Noise signal analyzer, noise signal synthesizer, noise signal analyzing method, and noise signal synthesizing method

ABSTRACT

FFT section 102 transforms a windowed input noise signal into a frequency spectrum. Spectral model storing section 103 stores model information on spectral models. Spectral model series calculating section 104 calculates a spectral model number series corresponding to the amplitude spectral series of the input noise signal, using the model information stored in spectral model storing section 103. Duration model/transition probability calculating section 105 outputs model parameters using the spectral model number series calculated in spectral model series calculating section 104. It is thereby possible to synthesize a background noise with perceptually high quality.

Technical Field

[0001] The present invention relates to a noise signal analysis apparatus and synthesis apparatus for analyzing and synthesizing a background noise signal superimposed on a speech signal, and to a speech coding apparatus for coding the speech signal using the analysis apparatus and synthesis apparatus.

BACKGROUND ART

[0002] In the fields of mobile communications and speech storage, for effective utilization of radio signals and storage media, a speech coding apparatus is used that compresses speech information and encodes it at low bit rates. As a conventional technique in such a speech coding apparatus, there is a CS-ACELP coding scheme with DTX (Discontinuous Transmission) control of ITU-T Recommendation G.729, Annex B (“A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70”).

[0003] FIG. 1 is a block diagram illustrating a configuration of a speech coding apparatus using the conventional CS-ACELP coding scheme with DTX control. In FIG. 1, an input speech signal is input to speech/non-speech determiner 11, CS-ACELP speech coder 12 and non-speech interval coder 13. First, speech/non-speech determiner 11 determines whether the input speech signal is of a speech interval or of a non-speech interval (interval with only a background noise).

[0004] When speech/non-speech determiner 11 determines that the signal is of a speech interval, CS-ACELP speech coder 12 performs speech coding on the signal of the speech interval. Coded data of the speech interval is output to DTX control/multiplexer 14.

[0005] Meanwhile, when speech/non-speech determiner 11 determines that the signal is of a non-speech interval, non-speech interval coder 13 performs coding on the noise signal of the non-speech interval. Using the input speech signal, non-speech interval coder 13 calculates the same LPC coefficients as in coding of a speech interval, together with the LPC prediction residual energy of the input speech signal, and outputs them to DTX control/multiplexer 14 as coded data of the non-speech interval. In addition, the coded data of the non-speech interval is transmitted intermittently, at intervals at which a predetermined change in characteristics (LPC coefficients or energy) of the input signal is detected.

[0006] Using outputs from speech/non-speech determiner 11, CS-ACELP speech coder 12 and non-speech interval coder 13, DTX control/multiplexer 14 controls and multiplexes the data to be transmitted, and outputs the result as transmit data.

[0007] The conventional speech coder as described above has the effect of decreasing the average bit rate of transmit signals by performing coding with a CS-ACELP speech coder only at a speech interval of an input speech signal, while at a non-speech interval (interval with only noise) of the input speech signal, performing coding intermittently using a dedicated non-speech interval coder with fewer bits than in the speech coder.

[0008] However, in the above-mentioned conventional speech coding method, due to the factors described below, a receiving-side apparatus that receives data coded in a transmitting-side apparatus has a problem that the quality of a decoded signal corresponding to a noise signal at a non-speech interval deteriorates. A first factor is that the non-speech interval coder (noise signal analyzing/coding section) in the transmitting-side apparatus performs coding with the same signal model as in the speech coder (generates a decoded signal by applying an AR type of synthesis filter (LPC synthesis filter) to a noise signal on a short-term (approximately 10 to 50 ms) basis).

[0009] A second factor is that the receiving-side apparatus synthesizes (generates) a noise using the coded data obtained by intermittently analyzing an input noise signal in the transmitting-side apparatus.

DISCLOSURE OF INVENTION

[0010] It is an object of the present invention to provide a noise signal synthesis apparatus capable of synthesizing a background noise signal with perceptually high quality.

[0011] The object is achieved by representing a noise signal with statistical models. Specifically, using a plurality of stationary noise models representative of an amplitude spectral time series following a statistical distribution with a duration of the amplitude spectral time series following another statistical distribution, a noise signal is represented as a spectral series statistically transiting between the stationary noise models.

BRIEF DESCRIPTION OF DRAWINGS

[0012] FIG. 1 is a block diagram illustrating a configuration of a coding apparatus using a conventional CS-ACELP coding scheme with DTX control;

[0013] FIG. 2 is a block diagram illustrating a configuration of a noise signal analysis apparatus according to a first embodiment of the present invention;

[0014] FIG. 3 is a block diagram illustrating a configuration of a noise signal synthesis apparatus according to the first embodiment of the present invention;

[0015] FIG. 4 is a flow diagram showing the operation of the noise signal analysis apparatus according to the first embodiment of the present invention;

[0016] FIG. 5 is a flow diagram showing the operation of the noise signal synthesis apparatus according to the first embodiment of the present invention;

[0017] FIG. 6 is a block diagram illustrating a configuration of a speech coding apparatus according to a second embodiment of the present invention;

[0018] FIG. 7 is a block diagram illustrating a configuration of a speech decoding apparatus according to the second embodiment of the present invention;

[0019] FIG. 8 is a flow diagram showing the operation of the speech coding apparatus according to the second embodiment of the present invention;

[0020] FIG. 9 is a flow diagram showing the operation of the speech decoding apparatus according to the second embodiment of the present invention;

[0021] FIG. 10 is a block diagram illustrating a configuration of a noise signal analysis apparatus according to a third embodiment of the present invention;

[0022] FIG. 11 is a block diagram illustrating a configuration of a spectral model parameter calculating/quantizing section according to the third embodiment of the present invention;

[0023] FIG. 12 is a block diagram illustrating a configuration of a noise signal synthesis apparatus according to the third embodiment of the present invention;

[0024] FIG. 13 is a flow diagram showing the operation of the noise signal analysis apparatus according to the third embodiment of the present invention;

[0025] FIG. 14 is a flow diagram showing the operation of the spectral model parameter calculating/quantizing section according to the third embodiment of the present invention;

[0026] FIG. 15 is a flow diagram showing the operation of the noise signal synthesis apparatus according to the third embodiment of the present invention;

[0027] FIG. 16 is a block diagram illustrating a configuration of a speech coding apparatus according to a fourth embodiment of the present invention;

[0028] FIG. 17 is a block diagram illustrating a configuration of a speech decoding apparatus according to the fourth embodiment of the present invention;

[0029] FIG. 18 is a flow diagram showing the operation of the speech coding apparatus according to the fourth embodiment of the present invention; and

[0030] FIG. 19 is a flow diagram showing the operation of the speech decoding apparatus according to the fourth embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

[0031] Embodiments of the present invention will be described below with reference to the accompanying drawings.

First Embodiment

[0032] In the present invention, a noise signal is represented with statistical models. That is, using a plurality of stationary noise models representative of an amplitude spectral time series following a statistical distribution with a duration of the amplitude spectral time series following another statistical distribution, a noise signal is represented as a spectral series statistically transiting between the stationary noise models.

[0033] More specifically, a stationary noise spectrum is represented by amplitude spectral time series {Si(n)} (n=1, . . . , Li, i=1, . . . , M) with M spectral models. Li indicates the duration (the unit of time here is a number of frames) of each amplitude spectral time series {Si(n)}. It is assumed that each of {Si(n)} and Li follows a normal distribution. Then, a background noise is represented as a spectral series transiting between the spectral time series models {Si(n)} with a transition probability of p(i,j) (i,j=1, . . . , M).
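As an illustration only, the parameters named in the preceding paragraph can be grouped into a small data structure. The class and field names below (SpectralModel, NoiseModel, validate) are hypothetical and do not appear in the specification, and the check that each row of p(i,j) sums to one is an assumption about how the transition probabilities are normalized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpectralModel:
    """One stationary noise model S_i (illustrative naming)."""
    sav: List[float]  # average amplitude per frequency bin (Sav_i)
    sdv: List[float]  # standard deviation per frequency bin (Sdv_i)
    lav: float        # average duration in frames (Lav_i)
    ldv: float        # standard deviation of the duration in frames (Ldv_i)


@dataclass
class NoiseModel:
    """M spectral models plus the transition probabilities p(i,j)."""
    models: List[SpectralModel]
    p: List[List[float]]  # p[i][j]: probability of moving from S_(i+1) to S_(j+1)

    def validate(self) -> bool:
        # Assumption: p is an M x M matrix whose rows each sum to one.
        m = len(self.models)
        if len(self.p) != m:
            return False
        return all(len(row) == m and abs(sum(row) - 1.0) < 1e-9 for row in self.p)
```

Holding the spectral statistics, duration statistics and transition matrix together in this way mirrors the set of model parameters exchanged between the analysis and synthesis sides.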

[0034] FIG. 2 is a block diagram illustrating a configuration of a noise signal analysis apparatus according to the first embodiment of the present invention. In the noise signal analysis apparatus illustrated in FIG. 2, with respect to input noise signal x(j) (j=0, . . . , N−1; N: analysis length) corresponding to m-th frame (m=0,1,2, . . . ) input for each predetermined interval (hereinafter referred to as “frame”), windowing section 101 performs windowing, for example, using a Hanning window. FFT (Fast Fourier Transform) section 102 transforms the windowed input noise signal into a frequency spectrum, and calculates input amplitude spectrum X(m) of the m-th frame.
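The windowing and transform steps can be sketched as follows. This is a minimal illustration that uses a direct discrete Fourier transform from the Python standard library in place of an FFT (the result is the same, only slower); the function names hann_window and amplitude_spectrum are hypothetical, and the symmetric form of the window is an assumption.

```python
import cmath
import math
from typing import List


def hann_window(n: int) -> List[float]:
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * j / (n - 1)) for j in range(n)]


def amplitude_spectrum(frame: List[float]) -> List[float]:
    """Window one frame x(j), j = 0..N-1, and return |X(k)| for k = 0..N/2."""
    n = len(frame)
    window = hann_window(n)
    xw = [x * w for x, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT sum for bin k (an FFT computes the same values faster).
        acc = sum(xw[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        spectrum.append(abs(acc))
    return spectrum
```

Applying amplitude_spectrum to each successive frame yields the series {X(m)} used in the following paragraphs.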

[0035] Using model information on spectral model Si (i=1, . . . , M) stored in spectral model storing section 103, spectral model series calculating section 104 calculates spectral model number series {index(m)} (1≦index(m)≦M, m=0,1,2, . . . ) corresponding to amplitude spectral series {X(m)} (m=0,1,2, . . . ) of the input noise signal. The model information on spectral model Si (i=1, . . . , M) includes average amplitude Sav_i and standard deviation Sdv_i, which are statistical parameters of Si. These can be prepared in advance by learning. The corresponding spectral model number series is calculated by obtaining, for each frame, number i of the spectral model Si whose average amplitude Sav_i has the least distance from input amplitude spectrum X(m).
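A minimal sketch of this nearest-model assignment is given below. The specification does not state which distance measure is used, so squared Euclidean distance is assumed here purely for illustration; the function names are likewise hypothetical.

```python
from typing import List, Sequence


def nearest_model(x: Sequence[float], sav: Sequence[Sequence[float]]) -> int:
    """Return the 1-based number i of the model whose Sav_i is closest to x."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        # Assumed distance measure: squared Euclidean distance.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(range(len(sav)), key=lambda i: dist2(x, sav[i]))
    return best + 1


def model_number_series(frames: Sequence[Sequence[float]],
                        sav: Sequence[Sequence[float]]) -> List[int]:
    """Map amplitude spectral series {X(m)} to number series {index(m)}."""
    return [nearest_model(x, sav) for x in frames]
```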

[0036] Using spectral model number series {index(m)} obtained in spectral model series calculating section 104, duration model/transition probability calculating section 105 calculates statistical parameters (average value Lav_i and standard deviation Ldv_i of Li) concerning number-of-successive frames Li corresponding to each Si, and transition probability p(i,j) between Si and Sj, and outputs them as model parameters of the input noise signal. In addition, these model parameters are calculated and transmitted at predetermined intervals or at arbitrary intervals.
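One possible way to derive these statistics from the number series is sketched below. Two points are assumptions rather than statements of the specification: transitions are counted only between runs of identical model numbers (so that staying in the same model is captured by the duration statistics instead of p(i,i)), and the standard deviation is the population form. All function names are hypothetical.

```python
import math
from itertools import groupby
from typing import Dict, List, Tuple


def run_lengths(index: List[int]) -> List[Tuple[int, int]]:
    """Collapse {index(m)} into (model number, number of successive frames) runs."""
    return [(k, len(list(g))) for k, g in groupby(index)]


def duration_stats(index: List[int], m: int) -> Dict[int, Tuple[float, float]]:
    """Return {i: (Lav_i, Ldv_i)} for i = 1..M; (0.0, 0.0) if model i never occurs."""
    runs = run_lengths(index)
    stats = {}
    for i in range(1, m + 1):
        lengths = [n for k, n in runs if k == i]
        if not lengths:
            stats[i] = (0.0, 0.0)
            continue
        mean = sum(lengths) / len(lengths)
        var = sum((n - mean) ** 2 for n in lengths) / len(lengths)
        stats[i] = (mean, math.sqrt(var))
    return stats


def transition_probs(index: List[int], m: int) -> List[List[float]]:
    """Estimate p(i,j) as relative frequencies of run-to-run transitions."""
    runs = run_lengths(index)
    counts = [[0] * m for _ in range(m)]
    for (a, _), (b, _) in zip(runs, runs[1:]):
        counts[a - 1][b - 1] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs
```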

[0037] FIG. 3 is a block diagram illustrating a configuration of a noise signal synthesis apparatus according to the first embodiment of the present invention. In the noise signal synthesis apparatus illustrated in FIG. 3, using transition probability p(i,j) between Si and Sj among the model parameters (average value Lav_i and standard deviation Ldv_i of Li and transition probability p(i,j) between Si and Sj) obtained in the noise signal analysis apparatus illustrated in FIG. 2, transition series generating section 201 generates spectral model number transition series {index′(l)} (1≦index′(l)≦M, l=0,1,2, . . . ) such that the transitions of spectral model Si follow the given transition probability p(i,j).
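Generating such a series amounts to sampling a Markov chain over the model numbers. The sketch below is one way to do so under the assumption that every row of p(i,j) has a positive sum; the function name, the start parameter and the use of a seeded random.Random instance are illustrative choices, not part of the specification.

```python
import random
from typing import List, Optional


def generate_transition_series(p: List[List[float]], length: int, start: int = 1,
                               rng: Optional[random.Random] = None) -> List[int]:
    """Sample {index'(l)}, l = 0..length-1, as 1-based model numbers."""
    if length <= 0:
        return []
    rng = rng or random.Random()
    m = len(p)
    series = [start]
    while len(series) < length:
        row = p[series[-1] - 1]  # transition probabilities out of the current model
        nxt = rng.choices(range(1, m + 1), weights=row, k=1)[0]
        series.append(nxt)
    return series
```

With a two-model matrix such as [[0, 1], [1, 0]] the series simply alternates between models 1 and 2; less extreme matrices produce correspondingly varied sequences.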

[0038] Using model number index′(l) obtained in transition series generating section 201 and the model information (average amplitude Sav_i and standard deviation Sdv_i of Si) on spectral model Si (i=1, . . . , M) stored in spectral model storing section 202, spectrum generating section 205 generates amplitude spectral time series {X′(n)}, indicated in the following equation, corresponding to index′(l):

{X′(n)}={S_(index′(l))(n)}, n=1,2, . . . , L  (1)

[0039] Herein, it is assumed that S_(index′(l)) follows a normal distribution with average amplitude Sav_i and standard deviation Sdv_i with respect to i=index′(l), and number-of-successive frames L is controlled in duration control section 203 to follow a normal distribution with average value Lav_i and standard deviation Ldv_i with respect to i=index′(l), using statistical model parameters (average value Lav_i and standard deviation Ldv_i of Li) of number-of-successive frames Li corresponding to spectral model Si output from the noise signal analysis apparatus.
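Equation (1) together with the two normal-distribution assumptions can be illustrated as follows. Rounding the sampled duration to an integer of at least one frame and clipping negative amplitudes to zero are assumptions added here so that the sketch always yields a usable segment; the specification does not describe how such cases are handled. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def sample_duration(lav: float, ldv: float, rng: random.Random) -> int:
    """Draw number-of-successive frames L ~ N(Lav_i, Ldv_i), at least one frame."""
    return max(1, round(rng.gauss(lav, ldv)))


def sample_segment(sav: Sequence[float], sdv: Sequence[float],
                   lav: float, ldv: float, rng: random.Random) -> List[List[float]]:
    """Generate {X'(n)}, n = 1..L, for one model: each bin ~ N(Sav_i, Sdv_i)."""
    n_frames = sample_duration(lav, ldv, rng)
    return [
        [max(0.0, rng.gauss(mu, sd)) for mu, sd in zip(sav, sdv)]
        for _ in range(n_frames)
    ]
```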

[0040] Further, according to the above method, spectrum generating section 205 adds random phases generated in random phase generating section 204 to the amplitude spectral time series with a predetermined time duration (a number of frames) generated according to transition series {index′(l)} to generate a spectral time series. In addition, spectrum generating section 205 may perform smoothing on the generated amplitude spectral time series so that the spectrum varies smoothly.

[0041] IFFT (Inverse Fast Fourier Transform) section 206 transforms the spectral time series generated in spectrum generating section 205 into a waveform of time domain. Overlap adding section 207 superimposes overlapping signals between frames, and thereby outputs a final synthesized noise signal.
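The phase assignment, inverse transform and overlap-add stages might look like the sketch below. It uses a direct inverse DFT from the standard library rather than an IFFT, keeps the DC and Nyquist bins real so that the output is real valued, and takes the hop size between frames as a parameter; these choices and all function names are assumptions made for illustration.

```python
import cmath
import math
import random
from typing import List, Sequence


def add_random_phase(amp: Sequence[float], rng: random.Random) -> List[complex]:
    """Attach uniformly random phases to bins 0..N/2 (DC and Nyquist stay real)."""
    last = len(amp) - 1
    spec = []
    for k, a in enumerate(amp):
        if k == 0 or k == last:
            spec.append(complex(a, 0.0))
        else:
            spec.append(cmath.rect(a, rng.uniform(-math.pi, math.pi)))
    return spec


def inverse_real_dft(half: Sequence[complex]) -> List[float]:
    """Inverse DFT of a half spectrum (bins 0..N/2) using Hermitian symmetry."""
    n = 2 * (len(half) - 1)
    full = list(half) + [half[n - k].conjugate() for k in range(len(half), n)]
    return [
        sum(full[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)).real / n
        for j in range(n)
    ]


def overlap_add(frames: Sequence[Sequence[float]], hop: int) -> List[float]:
    """Superimpose successive time-domain frames shifted by hop samples."""
    if not frames:
        return []
    n = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for m, frame in enumerate(frames):
        for j, v in enumerate(frame):
            out[m * hop + j] += v
    return out
```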

[0042] Operations of the noise signal analysis apparatus and noise signal synthesis apparatus with the above configurations will be described below with reference to FIGS. 4 and 5. FIG. 4 is a flow diagram showing the operation of the noise signal analysis apparatus according to the first embodiment of the present invention. FIG. 5 is a flow diagram showing the operation of the noise signal synthesis apparatus according to the first embodiment of the present invention.

[0043] First, the operation of the noise signal analysis apparatus according to this embodiment will be described with reference to FIG. 4. In step (hereinafter referred to as “ST”) 301, noise signal x(j) (j=0, . . . , N−1; N: analysis length) for each frame is input to windowing section 101. In ST302 windowing section 101 performs windowing, for example, using a Hanning window, on the input noise signal corresponding to m-th frame (m=0,1,2, . . . ). In ST303 FFT section 102 performs FFT (Fast Fourier Transform) on the windowed input noise signal to transform it into a frequency spectrum. Input amplitude spectrum X(m) of the m-th frame is thereby calculated.

[0044] In ST304, using model information on spectral model Si (i=1, . . . , M), spectral model series calculating section 104 calculates spectral model number series {index(m)} (1≦index(m)≦M, m=0,1,2, . . . ) corresponding to amplitude spectral series {X(m)} (m=0,1,2, . . . ) of the input noise signal.

[0045] The model information on spectral model Si (i=1, . . . , M) includes average amplitude Sav_i and standard deviation Sdv_i, which are statistical parameters of Si. These can be prepared in advance by learning. The corresponding spectral model number series is calculated by obtaining, for each frame, number i of the spectral model Si whose average amplitude Sav_i has the least distance from input amplitude spectrum X(m). The processing of ST301 to ST304 is performed for each frame.

[0046] In ST305, using spectral model number series {index(m)} obtained in ST304, duration model/transition probability calculating section 105 calculates statistical parameters (average value Lav_i and standard deviation Ldv_i of Li) concerning number-of-successive frames Li corresponding to each Si and transition probability p(i,j) between Si and Sj. In ST306, these values are output as model parameters corresponding to the input noise signal. In addition, these parameters are calculated and transmitted at predetermined intervals or at arbitrary intervals.

[0047] The operation of the noise signal synthesis apparatus according to this embodiment will be described with reference to FIG. 5. First, in ST401, model parameters (average value Lav_i and standard deviation Ldv_i of Li and transition probability p(i,j) between Si and Sj) obtained in the noise signal analysis apparatus are input to transition series generating section 201 and duration control section 203.

[0048] In ST402, using transition probability p(i,j) between Si and Sj among the input model parameters, transition series generating section 201 generates spectral model number transition series {index′(l)} (1≦index′(l)≦M, l=0,1,2, . . . ) such that the transitions of spectral model Si follow the given transition probability p(i,j).

[0049] In ST403, using statistical model parameters (average value Lav_i and standard deviation Ldv_i of Li) of number-of-successive frames Li corresponding to spectral model Si among the input model parameters, duration control section 203 generates number-of-successive frames L controlled to follow a normal distribution with average value Lav_i and standard deviation Ldv_i with respect to i=index′(l). In ST404 random phase generating section 204 generates random phases.

[0050] In ST405, using model number index′(l) obtained in ST402 and model information (average amplitude Sav_i and standard deviation Sdv_i of Si) on spectral model Si (i=1, . . . , M) that is prepared in advance, spectrum generating section 205 generates amplitude spectral time series {X′(n)}, indicated in equation (1), corresponding to index′(l). In addition, spectrum generating section 205 may perform smoothing on the generated amplitude spectral time series so that the spectrum varies smoothly.

[0051] Herein, it is assumed that S_(index′(l)) follows a normal distribution with average amplitude Sav_i and standard deviation Sdv_i with respect to i=index′(l), and number-of-successive frames L is the value generated in ST403.

[0052] Further, the amplitude spectral time series with a predetermined time duration (a number of frames) generated according to transition series {index′(l)} is given random phases generated in ST404, and thereby the spectral time series is generated.

[0053] In ST406 IFFT section 206 transforms the generated spectral time series into a waveform of time domain. In ST407 overlap adding section 207 superimposes overlapping signals between frames. In ST408 the superimposed signal is output as a final synthesized noise signal.

[0054] Thus, in this embodiment, a background noise is represented with statistical models. In other words, using a noise signal, the noise signal analysis apparatus (transmitting-side apparatus) generates statistical information (statistical model parameters) including spectral variations in the noise signal spectrum, and transmits the generated information to a noise signal synthesis apparatus (receiving-side apparatus). Using the information (statistical model parameters) transmitted from the noise signal analysis apparatus (transmitting-side apparatus), the noise signal synthesis apparatus (receiving-side apparatus) synthesizes a noise signal. In this way, the noise signal synthesis apparatus (receiving-side apparatus) is capable of using statistical information including spectral variations in the noise signal spectrum, instead of using a noise signal spectrum analyzed intermittently, to synthesize a noise signal, and thereby is capable of synthesizing a noise signal with less perceptual deterioration.

[0055] In addition, while this embodiment explains the above contents using a noise signal analysis apparatus and synthesis apparatus with the configurations illustrated respectively in FIGS. 2 and 3 and a noise signal analysis method and synthesis method shown respectively in FIGS. 4 and 5, it is possible to achieve the above contents with other means without departing from the spirit of the present invention. For example, while it is explained in the above embodiment that, as spectral model information, statistical models (average and standard deviation of S) of spectrum S are prepared in advance by learning, it is also possible to learn them in real time from an input noise signal, or to quantize them with spectral representative parameters such as LPC coefficients, and transmit them to the synthesizing side. Further, it is possible to prepare patterns of statistical parameters (average Lav and standard deviation Ldv of L) of spectral duration and statistical transition parameters between spectral models Si, select an appropriate one from the patterns corresponding to the input noise signal during a predetermined period and transmit it, and synthesize a noise signal based on the pattern.

Second Embodiment

[0056] This embodiment explains a case where a speech coding apparatus is achieved using the noise signal analysis apparatus as described in the first embodiment, and a speech decoding apparatus is achieved using the noise signal synthesis apparatus as described in the first embodiment.

[0057] The speech coding apparatus according to this embodiment will be described below with reference to FIG. 6. FIG. 6 is a block diagram illustrating a configuration of the speech coding apparatus according to the second embodiment of the present invention. In FIG. 6 an input speech signal is input to speech/non-speech determiner 501, speech coder 502 and noise signal coder 503.

[0058] Speech/non-speech determiner 501 determines whether the input speech signal is of a speech interval or non-speech interval (interval with only a noise), and outputs a determination. Speech/non-speech determiner 501 may be an arbitrary one, and in general is one that uses momentary amounts, variation amounts or the like of a plurality of parameters such as power, spectrum and pitch period of the input signal to make a determination.

[0059] When speech/non-speech determiner 501 determines that the input speech signal is of speech, speech coder 502 performs speech coding on the input speech signal, and outputs coded data to DTX control/multiplexer 504. Speech coder 502 is one for speech intervals, and is an arbitrary coder that encodes speech with high efficiency.

[0060] When speech/non-speech determiner 501 determines that the input speech signal is of non-speech, noise signal coder 503 performs noise signal coding on the input speech signal, and outputs model parameters corresponding to the input noise signal. Noise signal coder 503 is obtained by adding, to the noise signal analysis apparatus (see FIG. 2) described in the first embodiment, a configuration for outputting coded parameters resulting from the quantization and coding of the output model parameters.

[0061] Using outputs from speech/non-speech determiner 501, speech coder 502 and noise signal coder 503, DTX control/multiplexer 504 controls information to be transmitted as transmit data, multiplexes transmit information, and outputs the transmit data.

[0062] The speech decoding apparatus according to the second embodiment of the present invention will be described below with reference to FIG. 7. FIG. 7 is a block diagram illustrating a configuration of the speech decoding apparatus according to the second embodiment of the present invention. In FIG. 7 transmit data transmitted from the speech coding apparatus illustrated in FIG. 6 is input to demultiplexing/DTX controller 601 as received data.

[0063] Demultiplexing/DTX controller 601 demultiplexes the received data into speech coded data or noise model coded parameters and a speech/non-speech determination flag required for speech decoding and noise generation.

[0064] When the speech/non-speech determination flag is indicative of a speech interval, speech decoder 602 performs speech decoding using the speech coded data, and outputs a decoded speech. When the speech/non-speech determination flag is indicative of a non-speech interval, noise signal decoder 603 generates a noise signal using the noise model coded parameters, and outputs the noise signal. Noise signal decoder 603 is obtained by adding, to the noise signal synthesis apparatus (FIG. 3) described in the first embodiment, a configuration for decoding input model coded parameters into respective model parameters.

[0065] Output switch 604 switches between the outputs of speech decoder 602 and noise signal decoder 603 according to the speech/non-speech determination flag, and outputs the selected one as an output signal.
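The routing performed on the coding and decoding sides can be summarized in a short sketch. It is deliberately schematic: the determiner, coders and decoders are passed in as callables, the packet layout (a dictionary with a flag and a payload) is hypothetical, and the sketch emits a packet for every frame, whereas the apparatus described here transmits noise model parameters only at intervals.

```python
from typing import Any, Callable, Dict, Sequence

Frame = Sequence[float]


def encode_frame(frame: Frame,
                 is_speech: Callable[[Frame], bool],
                 speech_coder: Callable[[Frame], Any],
                 noise_coder: Callable[[Frame], Any]) -> Dict[str, Any]:
    """Route a frame to the speech coder or the noise signal coder and tag it."""
    if is_speech(frame):
        return {"speech": True, "payload": speech_coder(frame)}
    return {"speech": False, "payload": noise_coder(frame)}


def decode_frame(packet: Dict[str, Any],
                 speech_decoder: Callable[[Any], Any],
                 noise_decoder: Callable[[Any], Any]) -> Any:
    """Select the decoder output according to the speech/non-speech flag."""
    if packet["speech"]:
        return speech_decoder(packet["payload"])
    return noise_decoder(packet["payload"])
```

Any speech/non-speech determination method and any speech codec can be substituted for the callables, which reflects the statement above that these components are arbitrary.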

[0066] Operations of the speech coding apparatus and speech decoding apparatus with the above configurations will be described below. First, the operation of the speech coding apparatus will be described with reference to FIG. 8. FIG. 8 is a flow diagram showing the operation of the speech coding apparatus according to the second embodiment of the present invention.

[0067] In ST701 a speech signal for each frame is input. In ST702 the input speech signal is determined as a speech interval or non-speech interval (interval with only a noise), and a determination is output. The speech/non-speech determination is made by an arbitrary method, and in general, is made using momentary amounts, variation amounts or the like of a plurality of parameters such as power, spectrum and pitch period of the input signal.

[0068] When the speech/non-speech determination is indicative of speech in ST702, in ST703 speech coding is performed on the input speech signal, and the coded data is output. The speech coding processing is coding for speech intervals and is performed by an arbitrary method for coding speech with high efficiency.

[0069] Meanwhile, when the speech/non-speech determination is indicative of non-speech, in ST704 noise signal coding is performed on the input speech signal, and model parameters corresponding to the input noise signal are output. The noise signal coding is obtained by adding, to the noise signal analysis method described in the first embodiment, steps for outputting coded parameters resulting from the quantization and coding of the output model parameters.

[0070] In ST705, using outputs of speech/non-speech determination, speech coding and noise signal coding, information to be transmitted as transmit data is controlled (DTX control), and transmit information is multiplexed. In ST706 the result is output as the transmit data.

[0071] The operation of the speech decoding apparatus will be described below with reference to FIG. 9. FIG. 9 is a flow diagram showing the operation of the speech decoding apparatus according to the second embodiment of the present invention.

[0072] In ST801 transmit data obtained by coding an input signal at a coding side is input as received data. In ST802 the received data is demultiplexed into speech coded data or noise model coded parameters and a speech/non-speech determination flag required for speech decoding and noise generation.

[0073] When the speech/non-speech determination flag is indicative of a speech interval, in ST804 speech decoding is performed using the speech coded data, and a decoded speech is output. When the speech/non-speech determination flag is indicative of a non-speech interval, in ST805 a noise signal is generated using the noise model coded parameters, and the noise signal is output. The noise signal decoding processing is obtained by adding, to the noise signal synthesis method described in the first embodiment, steps for decoding input model coded parameters into respective model parameters.

[0074] In ST806, according to the speech/non-speech determination flag, an output of speech decoding in ST804 or of noise signal decoding in ST805 is output as a decoded signal.

[0075] Thus, according to this embodiment, speech coding enabling coding of a speech signal with high quality is performed at a speech interval, while at a non-speech interval, a noise signal is coded and decoded using a noise signal analysis apparatus and synthesis apparatus with less perceptual deterioration. It is thereby possible to perform coding of high quality even in circumstances with a background noise. Further, since the statistical characteristics of an actual surrounding noise signal are expected to be constant over a relatively long period (for example, a few seconds to a few tens of seconds), it is sufficient to set the transmit period of model parameters at such a long period. Therefore, the information amount of model parameters of a noise signal to be transmitted to a decoding side is reduced, and it is possible to achieve efficient transmission.

Third Embodiment

[0076] FIG. 10 is a block diagram illustrating a configuration of a noise signal analysis apparatus according to the third embodiment of the present invention.

[0077] Also in this embodiment, a stationary noise spectrum is represented by amplitude spectral time series {Si(n)} (n=1, . . . , Li, i=1, . . . , M) with M models composed of duration (a number of frames) Li (it is assumed that each of {Si(n)} and Li follows a normal distribution), and a background noise is represented as a spectral series transiting between the spectral time series models {Si(n)} with a transition probability of p(i,j) (i,j=1, . . . , M).

[0078] In the noise signal analysis apparatus illustrated in FIG. 10, with respect to input noise signal x(j) (j=0, . . . , N−1; N: analysis length) corresponding to m-th frame (m=0,1,2, . . . ) input for each predetermined interval (hereinafter referred to as “frame”), windowing section 901 performs windowing, for example, using a Hanning window. FFT (Fast Fourier Transform) section 902 transforms the windowed input noise signal into a frequency spectrum, and calculates input amplitude spectrum X(m) of the m-th frame. Spectral model parameter calculating/quantizing section 903 divides amplitude spectral series {X(m)} (m=0,1,2, . . . ) of the input noise signal into intervals with a predetermined number of frames or intervals with a number of frames adaptively determined according to some measure, uses each of the intervals as a unit interval (modeling interval) for modeling, calculates and quantizes spectral model parameters at the modeling interval, and outputs quantized indexes of the spectral model parameters. Further, the section 903 outputs spectral model number series {index(m)} (1≦index(m)≦M, m=mk, mk+1, mk+2, . . . , mk+NFRM−1; mk is a head frame number of a modeling interval, and NFRM is the number of frames at the modeling interval) corresponding to amplitude spectral series {X(m)} (m=0,1,2, . . . ) of the input noise signal. The spectral model parameters include average amplitude Sav_i and standard deviation Sdv_i that are statistical parameters of spectral model Si (i=1, . . . , M). A configuration of spectral model parameter calculating/quantizing section 903 will be described specifically later with reference to FIG. 11.

[0079] Using spectral model number series {index(m)} of the modeling interval obtained in spectral model parameter calculating/quantizing section 903, duration model/transition probability calculating/quantizing section 904 calculates and quantizes statistical parameters (duration model parameters) (average value Lav_i and standard deviation Ldv_i of Li) concerning number-of-successive frames Li corresponding to each Si and transition probability p(i,j) between Si and Sj, and outputs their quantized indexes. While an arbitrary quantizing method can be used, each element of Lav_i, Ldv_i and p(i,j) may, for example, undergo scalar quantization.
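As one concrete instance of scalar quantization, a uniform quantizer could be applied to each element. The specification leaves the quantizing method open, so the uniform step size, the value range [lo, hi] and the bit width used below are all assumptions, and the function names are hypothetical.

```python
def quantize_uniform(value: float, lo: float, hi: float, bits: int) -> int:
    """Map value in [lo, hi] to an integer index using 2**bits uniform levels."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)  # values outside the range are clipped
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize_uniform(index: int, lo: float, hi: float, bits: int) -> float:
    """Reconstruct the quantized value from its index."""
    levels = (1 << bits) - 1
    return lo + index * (hi - lo) / levels
```

For a transition probability, for example, lo = 0 and hi = 1 are natural bounds; suitable ranges for Lav_i and Ldv_i would depend on the frame length and the modeling interval.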

[0080] The section 904 outputs the spectral model parameters, duration model parameters, and transition probability parameters as statistical model parameter quantized indexes of the input noise signal at the modeling interval.

[0081] FIG. 11 is a block diagram illustrating a specific configuration of spectral model parameter calculating/quantizing section 903. The section 903 in this embodiment selects, from among a set of typical vectors of amplitude spectra representative of noise signals prepared in advance, a number (M) of typical vectors as models suitable for representing the input amplitude spectral time series at the modeling interval of the input noise, and based on the models, calculates and quantizes spectral model parameters.
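A rough sketch of this selection is given below, anticipating the criterion detailed in the next paragraph (the M clusters to which the input frames most frequently belong). Power normalization and quantization are omitted, squared Euclidean distance is assumed for the clustering, and the function name select_models is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def select_models(frames: Sequence[Sequence[float]],
                  typical: Sequence[Sequence[float]],
                  m: int) -> Tuple[List[int], List[List[float]], List[int]]:
    """Pick the M most frequently used typical vectors and average their frames.

    Returns (selected cluster numbers, average spectra Sav_i, per-frame labels),
    with cluster numbers given as 0-based positions in `typical`.
    """
    def nearest(x: Sequence[float]) -> int:
        # Assumed clustering rule: nearest typical vector in squared distance.
        return min(range(len(typical)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, typical[c])))

    labels = [nearest(x) for x in frames]
    top = [c for c, _ in Counter(labels).most_common(m)]
    sav = []
    for c in top:
        members = [x for x, lab in zip(frames, labels) if lab == c]
        sav.append([sum(col) / len(members) for col in zip(*members)])
    return top, sav, labels
```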

[0082] First, with respect to input amplitude spectrum X(m) (m=mk, mk+1, mk+2, . . . , mk+NFRM−1) of unit frame at the modeling interval, power normalizing section 1002 normalizes the power using power values obtained in power calculating section 1001. Clustering section 1004 clusters (vector-quantizes) the input amplitude spectra with normalized power into clusters each having as a cluster center a respective typical vector in noise spectral typical vector storing section 1003, and outputs information indicative of which cluster each of the input spectra belongs to. It is herein assumed that noise spectral typical vector storing section 1003 stores, as typical vectors, amplitude spectra of typical noise signals generated in advance by learning, and that the number of typical vectors is not less than the number (M) of models. Then, among the series of cluster (typical vector) numbers to which the input spectra belong obtained in clustering section 1004, each-cluster average spectrum calculating section 1005 selects higher-ranked M clusters (a corresponding typical vector is referred to as Ci (i=1,2, . . . , M)) in descending order of frequency of belonging at the modeling interval, and calculates for each cluster an average spectrum of the input noise amplitude spectra belonging to the cluster to prepare as average amplitude spectra Sav_i (i=1,2, . . . , M) of the spectral models. Further, the section 903 outputs spectral model number series {index(m)} (1≦index(m)≦M, m=mk, mk+1, mk+2, . . . , mk+NFRM−1) corresponding to amplitude spectral series {X(m)} of the input noise signal. The section 903 generates the number series as the number series belonging to higher-ranked M clusters, based on the series of cluster (typical vector) numbers to which the input spectra belong obtained in clustering section 1004. In other words, with respect to frames which do not belong to the higher-ranked M clusters, the section 903 associates the frames with numbers of the higher-ranked M clusters according to an arbitrary method (for example, re-clustering or replacing the number with a cluster number of a previous frame), or deletes such a frame from the series. Then, modeling interval average power quantizing section 1006 averages the power values calculated for each frame in power calculating section 1001 over the entire modeling interval, quantizes the average power using an arbitrary method such as scalar-quantization, and outputs power indexes and modeling interval average power value (quantized value) E. Error spectrum/power correction value quantizing section 1007 represents Sav_i as indicated in equation (2) using corresponding typical vector Ci, error spectrum di from Ci, modeling interval average power E and power correction value ei for E of each spectral model, and quantizes di and ei using an arbitrary method such as scalar-quantization.

Sav_i=sqrt(E)·ei·(Ci+di) (i=1, . . . , M)  (2)
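The power normalization, clustering and selection of the higher-ranked M clusters may be sketched, for illustration only, as follows. The squared Euclidean distance, the definition of power as the mean squared amplitude, the choice of re-clustering for frames outside the selected clusters, and the averaging of power-normalized spectra are assumptions of this sketch; the names normalize_power, nearest and select_models are hypothetical.

```python
from collections import Counter


def normalize_power(spectrum):
    """Scale a spectrum to unit mean-square amplitude; also return its power."""
    power = sum(a * a for a in spectrum) / len(spectrum)
    scale = power ** 0.5 if power > 0 else 1.0
    return [a / scale for a in spectrum], power


def nearest(codebook, vec):
    """Position of the typical vector closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(codebook[c], vec)))


def select_models(spectra, codebook, m):
    """Return (selected typical vector numbers, index series, cluster averages)."""
    normalized = [normalize_power(s)[0] for s in spectra]
    labels = [nearest(codebook, v) for v in normalized]

    # Higher-ranked M clusters in descending order of frequency of belonging.
    top = [c for c, _ in Counter(labels).most_common(m)]

    # Frames outside the selected clusters are re-clustered among them
    # (one of the "arbitrary methods" mentioned in the text).
    sub = [codebook[c] for c in top]
    index = [top.index(lab) + 1 if lab in top else nearest(sub, v) + 1
             for lab, v in zip(labels, normalized)]

    # Average of the normalized spectra originally belonging to each cluster.
    averages = []
    for c in top:
        members = [v for v, lab in zip(normalized, labels) if lab == c]
        averages.append([sum(col) / len(members) for col in zip(*members)])
    return top, index, averages
```

The returned index series is 1-based so as to match the range 1≦index(m)≦M used in the text.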

[0083] It may be possible to quantize error spectrum di by dividing di into a plurality of bands and performing scalar-quantization on an average value of each band. Thus, as quantized indexes of spectral model parameters, the section 903 outputs M-typical vector indexes obtained in each-cluster average spectrum calculating section 1005, error spectrum quantized indexes and power correction value quantized indexes obtained in error spectrum/power correction value quantizing section 1007, and power quantized indexes obtained in modeling interval average power quantizing section 1006.
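The band-wise scalar-quantization of error spectrum di and the reconstruction of Sav_i according to equation (2) may be sketched as follows. Equal-width bands, a uniform quantization step, and the function names are assumptions made only for this example; the embodiment leaves the quantizer arbitrary. The sketch also assumes that the number of bands does not exceed the number of spectral bins.

```python
def band_edges(n, num_bands):
    """Boundaries of num_bands roughly equal-width bands over n bins."""
    return [n * b // num_bands for b in range(num_bands + 1)]


def quantize_error_spectrum(d, num_bands, step):
    """One integer index per band: the band average divided by step, rounded."""
    edges = band_edges(len(d), num_bands)
    indexes = []
    for lo, hi in zip(edges, edges[1:]):
        band = d[lo:hi]
        indexes.append(round(sum(band) / len(band) / step))
    return indexes


def dequantize_error_spectrum(indexes, n, step):
    """Expand band indexes back to a piecewise-constant error spectrum."""
    edges = band_edges(n, len(indexes))
    d_hat = []
    for q, lo, hi in zip(indexes, edges, edges[1:]):
        d_hat.extend([q * step] * (hi - lo))
    return d_hat


def reconstruct_sav(E, e_i, c_i, d_i):
    """Equation (2): Sav_i = sqrt(E) * ei * (Ci + di), applied per bin."""
    gain = (E ** 0.5) * e_i
    return [gain * (c + d) for c, d in zip(c_i, d_i)]
```

A decoder holding the same typical vectors Ci would call dequantize_error_spectrum and then reconstruct_sav with the decoded values of E and ei.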

[0084] In addition, as standard deviation Sdv_i among the spectral model parameters, the section 903 uses an inner-cluster standard deviation value corresponding to Ci obtained in learning noise spectral typical vectors. Storing the value in advance in the noise spectral typical vector storing section eliminates the need of outputting quantized indexes. Further, it may be possible that each-cluster average spectrum calculating section 1005 also calculates and quantizes the standard deviation in the cluster when calculating the average spectrum. In this case, the section 903 outputs the quantized indexes as part of the quantized indexes of the spectral model parameters.

[0085] In addition, while the above embodiment explains the quantization of error spectrum using scalar-quantization for each band, it may be possible to perform another quantization method such as vector-quantization on the entire band. Further, while it is explained that the power information is represented by average power of a modeling interval and correction value for average power for each model, it may be possible to represent the power information by only the power for each model or to use the average power of a modeling interval as power of all the models.

[0086] FIG. 12 is a block diagram illustrating a configuration of a noise signal synthesis apparatus according to the third embodiment of the present invention. In the noise signal synthesis apparatus illustrated in FIG. 12, using quantized indexes of transition probability p(i,j) between Si and Sj among statistical model parameter quantized indexes obtained in the noise signal analysis apparatus illustrated in FIG. 10, transition series generating section 1101 decodes transition probability p(i,j), and generates spectral model number transition series {index′(l)} (1≦index′(l)≦M, l=0,1,2, . . . ) such that the transition of spectral model Si becomes given transition probability p(i,j). Spectral model parameter decoding section 1103 decodes average amplitude Sav_i and standard deviation Sdv_i (i=1, . . . , M) that are statistical parameters of spectral model Si from quantized indexes of spectral model parameters. The section 1103 decodes average amplitude Sav_i according to equation (2), using quantized indexes obtained in spectral model parameter calculating/quantizing section 903 in the coding apparatus, and typical vectors in the noise spectral typical vector storing section, the same as at the coding side, provided in spectral model parameter decoding section 1103. With respect to standard deviation Sdv_i, when using an inner-cluster standard deviation value corresponding to Ci obtained in learning noise spectral typical vectors in the coding apparatus, the section 1103 obtains a corresponding value from noise spectral typical vector storing section 1003 to decode. Using model number index′(l) obtained in transition series generating section 1101 and the model information (average amplitude Sav_i and standard deviation Sdv_i of Si) on spectral model Si (i=1, . . . , M) obtained in spectral model parameter decoding section 1103, spectrum generating section 1105 generates amplitude spectral time series {X′(n)}, indicated in the following equation, corresponding to index′(l):

{X′(n)}={S_(index′(l))(n)}, n=1,2, . . . , L  (3)

[0087] Herein, it is assumed that S_(index′(l)) follows a normal distribution with average amplitude Sav_i and standard deviation Sdv_i with respect to i=index′(l).

[0088] Number-of-successive frames L is controlled in duration control section 1102 to follow a normal distribution with average value Lav_i and standard deviation Ldv_i with respect to i=index′(l), using values (average value Lav_i and standard deviation Ldv_i of Li) decoded from the quantized indexes of statistical model parameters of number-of-successive frames Li corresponding to spectral model Si output from the noise signal analysis apparatus.
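The generation of the transition series, the durations and the amplitude spectral time series of equation (3) may be sketched as follows, purely as an example. It assumes that the transition probabilities are held as a nested table p[i][j], that Sdv_i is given per frequency bin, that a drawn duration is rounded and limited to at least one frame, and that negative amplitude draws are clipped to zero; these details and the function names are not taken from the embodiment.

```python
import random


def next_model(p, current, rng):
    """Draw the next model number according to row p[current]."""
    targets = list(p[current].keys())
    weights = [p[current][j] for j in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


def draw_duration(lav, ldv, rng):
    """Number of successive frames L drawn from N(lav, ldv), at least 1."""
    return max(1, round(rng.gauss(lav, ldv)))


def generate_amplitude_series(p, lav, ldv, sav, sdv, nfrm, start, rng):
    """Return (model number per frame, amplitude spectrum per frame) for nfrm frames."""
    models, frames = [], []
    i = start
    while len(frames) < nfrm:
        duration = draw_duration(lav[i], ldv[i], rng)
        for _ in range(duration):
            if len(frames) == nfrm:
                break
            # Each bin follows a normal distribution around Sav_i (clipped at 0).
            frames.append([max(0.0, rng.gauss(mu, sd))
                           for mu, sd in zip(sav[i], sdv[i])])
            models.append(i)
        i = next_model(p, i, rng)
    return models, frames
```

With all standard deviations set to zero the output is deterministic, which gives a simple way to check that the durations and transitions follow the decoded parameters.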

[0089] Further, according to the above method, spectrum generating section 1105 adds random phases generated in random phase generating section 1104 to the amplitude spectral time series with a predetermined time duration (=NFRM that is the number of frames of a modeling interval) generated according to transition series {index′(l)}, and thereby generates a spectral time series. In addition, spectrum generating section 1105 may perform smoothing on the generated amplitude spectral time series so that the spectrum varies smoothly.

[0090] IFFT (Inverse Fast Fourier Transform) section 1106 transforms the spectral time series generated in spectrum generating section 1105 into a time-domain waveform. Overlap adding section 1107 superimposes overlapping signals between frames, and thereby outputs a final synthesized noise signal.
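The addition of random phases, the inverse transform and the overlap addition may be sketched as follows. This is an example under stated assumptions only: phases are drawn uniformly, the DC and Nyquist bins are kept real so that the time-domain frame is real, a naive inverse DFT stands in for the IFFT, and no synthesis window is applied. The function names are hypothetical.

```python
import cmath
import math
import random


def add_random_phase(amplitude, rng):
    """Attach uniform random phases to a half spectrum (bins 0..N/2)."""
    last = len(amplitude) - 1
    half = []
    for k, a in enumerate(amplitude):
        if k == 0 or k == last:
            half.append(complex(a, 0.0))  # DC and Nyquist bins stay real
        else:
            half.append(cmath.rect(a, rng.uniform(-math.pi, math.pi)))
    return half


def inverse_dft_real(half):
    """Real frame of length N = 2 * (len(half) - 1) from a half spectrum."""
    n = 2 * (len(half) - 1)
    # Rebuild bins N/2+1..N-1 by conjugate symmetry.
    full = half + [half[k].conjugate() for k in range(len(half) - 2, 0, -1)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * j / n)
                for k in range(n)).real / n
            for j in range(n)]


def overlap_add(frames, hop):
    """Sum equal-length frames placed hop samples apart."""
    n = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for m, frame in enumerate(frames):
        for j, v in enumerate(frame):
            out[m * hop + j] += v
    return out
```

In practice the frame shift passed as hop would match the frame interval used at the analysis side.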

[0091] Operations of the noise signal analysis apparatus and noise signal synthesis apparatus with the above configurations will be described below with reference to FIGS. 13 to 15.

[0092] First, the operation of the noise signal analysis apparatus according to this embodiment will be described with reference to FIG. 13. In step (hereinafter referred to as “ST”) 1201, noise signal x(j) (j=0, . . . , N−1; N: analysis length) for each frame is input to windowing section 901. In ST1202 windowing section 901 performs windowing, for example, using a Hanning window, on the input noise signal corresponding to the m-th frame (m=0,1,2, . . . ). In ST1203 FFT section 902 performs FFT (Fast Fourier Transform) on the windowed input noise signal to transform it into a frequency spectrum. Input amplitude spectrum X(m) of the m-th frame is thereby calculated. In ST1204 spectral model parameter calculating/quantizing section 903 divides amplitude spectral series {X(m)} (m=0,1,2, . . . ) of the input noise signal into intervals with a predetermined number of frames or intervals with a number of frames adaptively determined according to some measure, uses each of the intervals as a unit interval (modeling interval) for modeling, calculates and quantizes spectral model parameters at the modeling interval, and outputs quantized indexes of the spectral model parameters. Further, the section 903 outputs spectral model number series {index(m)} (1≦index(m)≦M, m=mk, mk+1, mk+2, . . . , mk+NFRM−1; mk is the head frame number of a modeling interval, and NFRM is the number of frames in the modeling interval) corresponding to amplitude spectral series {X(m)} (m=0,1,2, . . . ) of the input noise signal. The spectral model parameters include average amplitude Sav_i and standard deviation Sdv_i that are statistical parameters of spectral model Si (i=1, . . . , M). The operation of spectral model parameter calculating/quantizing section 903 in ST1204 will be described specifically later with reference to FIG. 14.

[0093] In ST1205, using spectral model number series {index(m)} of the modeling interval obtained in ST1204, duration model/transition probability calculating/quantizing section 904 calculates and quantizes statistical parameters (duration model parameters) (average value Lav_i and standard deviation Ldv_i of Li) concerning number-of-successive frames Li corresponding to each Si and transition probability p(i,j) between Si and Sj, and outputs their quantized indexes. Any quantizing method may be used; for example, each element of Lav_i, Ldv_i and p(i,j) may undergo scalar-quantization.

[0094] In ST1206, the above quantized indexes of spectral model parameters, duration model parameters, and transition probability parameters are output as statistical model parameter quantized indexes of the input noise signal at the modeling interval.

[0095] FIG. 14 is a flow diagram showing the specific operation of spectral model parameter calculating/quantizing section 903 in ST1204 in FIG. 13. The section 903 in this embodiment selects, from among typical vector sets of amplitude spectra representative of noise signals prepared in advance, a number (M) of models of typical vector suitable for representing the input amplitude spectral time series at the modeling interval of the input noise, and based on the models, calculates and quantizes spectral model parameters.

[0096] In ST1301, input amplitude spectrum X(m) (m=mk, mk+1, mk+2, . . . , mk+NFRM−1) of unit frame at the modeling interval is input. In ST1302, power calculating section 1001 calculates power of a frame with respect to the input amplitude spectrum. In ST1303 power normalizing section 1002 normalizes the power using power values calculated in power calculating section 1001. In ST1304 clustering section 1004 clusters (vector-quantizes) input amplitude spectra with normalized power into clusters each having as a cluster center a respective typical vector in noise spectral typical vector storing section 1003, and outputs information indicative of which cluster each of the input spectra belongs to. In ST1305, among the series of cluster (typical vector) numbers to which the input spectra belong obtained in clustering section 1004, each-cluster average spectrum calculating section 1005 selects higher-ranked M clusters (a corresponding typical vector is referred to as Ci (i=1,2, . . . , M)) in descending order of frequency of belonging at the modeling interval, and calculates for each cluster an average spectrum of the input noise spectra belonging to the cluster to prepare as average amplitude spectra Sav_i (i=1,2, . . . , M) of the spectral models. Further, the section 903 outputs spectral model number series {index(m)} (1≦index(m)≦M, m=mk, mk+1, mk+2, . . . , mk+NFRM−1) corresponding to amplitude spectral series {X(m)} of the input noise signal. The section 903 generates the number series as the number series belonging to higher-ranked M clusters, based on the series of cluster (typical vector) numbers to which the input spectra belong obtained in clustering section 1004. In other words, with respect to frames which do not belong to the higher-ranked M clusters, the section 903 associates the frames with numbers of the higher-ranked M clusters according to an arbitrary method (for example, re-clustering or replacing the number with a cluster number of a previous frame), or deletes such a frame from the series. In ST1306, modeling interval average power quantizing section 1006 averages the power values calculated for each frame in power calculating section 1001 over the entire modeling interval, quantizes the average power using an arbitrary method such as scalar-quantization, and outputs power indexes and modeling interval average power value (quantized value) E. In ST1307, with Sav_i represented as indicated in equation (2) using corresponding typical vector Ci, error spectrum di from Ci, modeling interval average power E and power correction value ei for E of each spectral model, error spectrum/power correction value quantizing section 1007 quantizes di and ei using an arbitrary method such as scalar-quantization.

[0097] It may be possible to quantize error spectrum di by dividing di into a plurality of bands and performing scalar-quantization on an average value of each band. In ST1308, M-typical vector indexes obtained in ST1305, error spectrum quantized indexes and power correction value quantized indexes obtained in ST1307, and power quantized indexes obtained in ST1306 are output as quantized indexes of spectral model parameters.

[0098] In addition, as standard deviation Sdv_i among the spectral model parameters, the section 903 uses an inner-cluster standard deviation value corresponding to Ci obtained in learning noise spectral typical vectors. Storing the value in advance in the noise spectral typical vector storing section eliminates the need of outputting quantized indexes. Further, in ST1305 it may be possible that each-cluster average spectrum calculating section 1005 also calculates and quantizes the standard deviation in the cluster when calculating the average spectrum. In this case, the section 903 outputs the quantized indexes as part of the quantized indexes of the spectral model parameters.

[0099] In addition, while the above embodiment explains the quantization of error spectrum using scalar-quantization for each band, it may be possible to perform another quantization method such as vector-quantization on the entire band. Further, while it is explained that the power information is represented by average power of a modeling interval and correction value for average power for each model, it may be possible to represent the power information by only the power for each model or to use the average power of a modeling interval as power of all the models.

[0100] The operation of the noise signal synthesis apparatus according to this embodiment will be described below with reference to FIG. 15. In ST1401 respective quantized indexes of statistical model parameters obtained in the noise signal analysis apparatus are input. In ST1402 spectral model parameter decoding section 1103 decodes average amplitude Sav_i and standard deviation Sdv_i (i=1, . . . , M) that are statistical parameters of spectral model Si from quantized indexes of spectral model parameters. In ST1403, using quantized indexes of transition probability p(i,j) between Si and Sj, transition series generating section 1101 decodes transition probability p(i,j), and generates spectral model number transition series {index′(l)} (1≦index′(l)≦M, l=0,1,2, . . . ) such that the transition of spectral model Si becomes given transition probability p(i,j).

[0101] In ST1404, using decoded values (average value Lav_i and standard deviation Ldv_i of Li) from quantized indexes of statistical model parameters of number-of-successive frames Li corresponding to spectral model Si, duration control section 1102 generates number-of-successive frames L controlled to follow a normal distribution with average value Lav_i and standard deviation Ldv_i with respect to i=index′(l). In ST1405 random phase generating section 1104 generates random phases.

[0102] In ST1406, using model number index′(l) obtained in ST1403 and the model information (average amplitude Sav_i and standard deviation Sdv_i of Si) on spectral model Si (i=1, . . . , M) obtained in ST1402, spectrum generating section 1105 generates amplitude spectral time series {X′(n)}, indicated in equation (3), corresponding to index′(l).

[0103] Herein, it is assumed that S_(index′(l)) follows a normal distribution with average amplitude Sav_i and standard deviation Sdv_i with respect to i=index′(l), and number-of-successive frames L is generated in ST1404. In addition, it may be possible to perform smoothing on the generated amplitude spectral time series so that the spectrum varies smoothly. Further, spectrum generating section 1105 adds random phases generated in ST1405 to the amplitude spectral time series with a predetermined time duration (=NFRM that is the number of frames of a modeling interval) generated according to transition series {index′(l)}, and thereby generates a spectral time series.

[0104] In ST1407 IFFT section 1106 transforms the generated spectral time series into a time-domain waveform. In ST1408 overlap adding section 1107 superimposes overlapping signals between frames. In ST1409 the superimposed signal is output as a final synthesized noise signal.

[0105] Thus, in this embodiment, a background noise is represented with statistical models. In other words, using a noise signal, the noise signal analysis apparatus (transmitting-side apparatus) generates statistical information (statistical model parameters) including spectral variations in the noise signal spectrum, and transmits the generated information to a noise signal synthesis apparatus (receiving-side apparatus). Using the information (statistical model parameters) transmitted from the noise signal analysis apparatus (transmitting-side apparatus), the noise signal synthesis apparatus (receiving-side apparatus) synthesizes a noise signal. In this way, the noise signal synthesis apparatus (receiving-side apparatus) is capable of using statistical information including spectral variations in the noise signal spectrum, instead of using a noise signal spectrum analyzed intermittently, to synthesize a noise signal, and thereby is capable of synthesizing a noise signal with less perceptual deterioration. Further, since statistical characteristics of a noise signal of an actual surrounding noise are expected to be constant over a relatively long period (for example, a few seconds to a few tens of seconds), it is sufficient to set a transmit period of model parameters at such a long period. Therefore, an information amount of model parameters of a noise signal to be transmitted to a decoding side is reduced, and it is possible to achieve efficient transmission.

Fourth embodiment

[0106] This embodiment explains a case where a speech coding apparatus is achieved using the noise signal analysis apparatus as described in the third embodiment, and a speech decoding apparatus is achieved using the noise signal synthesis apparatus as described in the third embodiment.

[0107] The speech coding apparatus according to this embodiment will be described below with reference to FIG. 16. FIG. 16 is a block diagram illustrating a configuration of the speech coding apparatus according to the fourth embodiment of the present invention. In FIG. 16 an input speech signal is input to speech/non-speech determiner 1501, speech coder 1502 and noise signal coder 1503.

[0108] Speech/non-speech determiner 1501 determines whether the input speech signal is of a speech interval or non-speech interval (interval with only a noise), and outputs a determination. Speech/non-speech determiner 1501 may be an arbitrary one, and in general is one that makes a determination using momentary amounts, variation amounts or the like of a plurality of parameters such as power, spectrum and pitch period of the input signal.

[0109] When speech/non-speech determiner 1501 determines that the input speech signal is of speech, speech coder 1502 performs speech coding on the input speech signal, and outputs coded data to DTX control/multiplexer 1504. Speech coder 1502 is one for speech interval, and is an arbitrary coder that encodes speech with high efficiency.

[0110] When speech/non-speech determiner 1501 determines that the input speech signal is of non-speech, noise signal coder 1503 performs noise signal coding on the input speech signal, and outputs, as coded data, quantized indexes of statistical model parameters corresponding to the input noise signal. As noise signal coder 1503, the noise signal analysis apparatus (FIG. 10) as described in the third embodiment is used.

[0111] Using outputs from speech/non-speech determiner 1501, speech coder 1502 and noise signal coder 1503, DTX control/multiplexer 1504 controls information to be transmitted as transmit data, multiplexes transmit information, and outputs the transmit data.
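The routing of an input frame to the speech coder or to the noise signal coder may be sketched as follows, as a simplified example only. The determiner is arbitrary in the embodiment; the energy threshold used in energy_vad is a hypothetical stand-in, and the coders are passed in as functions. The per-frame return value is a simplification, since the noise signal coder of the third embodiment outputs its indexes per modeling interval rather than per frame.

```python
def energy_vad(threshold):
    """Hypothetical determiner: speech if mean frame power exceeds threshold."""
    def decide(frame):
        return sum(x * x for x in frame) / len(frame) > threshold
    return decide


def encode_frame(frame, is_speech, speech_coder, noise_coder):
    """Return (flag, coded data); flag is 1 for speech, 0 for non-speech."""
    if is_speech(frame):
        return 1, speech_coder(frame)
    return 0, noise_coder(frame)
```

The flag returned with the coded data corresponds to the speech/non-speech determination that the decoding side uses to select its output.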

[0112] The speech decoding apparatus according to the fourth embodiment of the present invention will be described below with reference to FIG. 17. FIG. 17 is a block diagram illustrating a configuration of the speech decoding apparatus according to the fourth embodiment of the present invention. In FIG. 17 transmit data transmitted from the speech coding apparatus illustrated in FIG. 16 is input to demultiplexing/DTX controller 1601 as received data.

[0113] Demultiplexing/DTX controller 1601 demultiplexes the received data into speech coded data or noise model coded parameters and a speech/non-speech determination flag required for speech decoding and noise generation.

[0114] When the speech/non-speech determination flag is indicative of speech interval, speech decoder 1602 performs speech decoding using the speech coded data, and outputs a decoded speech. When the speech/non-speech determination flag is indicative of non-speech interval, noise signal decoder 1603 generates a noise signal using the noise model coded parameters, and outputs the noise signal. As noise signal decoder 1603, the noise signal synthesis apparatus (FIG. 12) as described in the third embodiment is used.

[0115] Output switch 1604 switches between the outputs of speech decoder 1602 and noise signal decoder 1603 according to the speech/non-speech determination flag, and outputs the selected one as an output signal.

[0116] Operations of the speech coding apparatus and speech decoding apparatus with the above configurations will be described below. First, the operation of the speech coding apparatus will be described with reference to FIG. 18. FIG. 18 is a flow diagram showing the operation of the speech coding apparatus according to the fourth embodiment of the present invention.

[0117] In ST1701 a speech signal for each frame is input. In ST1702 the input speech signal is determined as a speech interval or non-speech interval (interval with only a noise), and a determination is output. The speech/non-speech determination is made by an arbitrary method, and in general, is made using momentary amounts, variation amounts or the like of a plurality of parameters such as power, spectrum and pitch period of the input signal.

[0118] When the speech/non-speech determination is indicative of speech in ST1702, in ST1703 speech coding is performed on the input speech signal, and the coded data is output. The speech coding processing is coding for speech interval and is performed by an arbitrary method for coding a speech with high efficiency.

[0119] Meanwhile, when the speech/non-speech determination is indicative of non-speech, in ST1704 noise signal coding is performed on the input speech signal, and model parameters corresponding to the input noise signal are output. As the noise signal coding, the noise signal analysis method as described in the third embodiment is used.

[0120] In ST1705, using outputs of speech/non-speech determination, speech coding and noise signal coding, information to be transmitted as transmit data is controlled (DTX control), and transmit information is multiplexed. In ST1706 the resultant is output as the transmit data.

[0121] The operation of the speech decoding apparatus will be described below with reference to FIG. 19. FIG. 19 is a flow diagram showing the operation of the speech decoding apparatus according to the fourth embodiment of the present invention.

[0122] In ST1801 transmit data obtained by coding an input signal at a coding side is received as received data. In ST1802 the received data is demultiplexed into speech coded data or noise model coded parameters and a speech/non-speech determination flag required for speech decoding and noise generation.

[0123] When the speech/non-speech determination flag is indicative of speech interval, in ST1804 speech decoding is performed using the speech coded data, and a decoded speech is output. When the speech/non-speech determination flag is indicative of non-speech interval, in ST1805 a noise signal is generated using the noise model coded parameters, and the noise signal is output. As the noise signal decoding processing, the noise signal synthesis method as described in the third embodiment is used.

[0124] In ST1806, according to the speech/non-speech determination flag, an output of speech decoding in ST1804 or of noise signal decoding in ST1805 is output as a decoded signal.

[0125] In addition, while the above embodiment explains that a decoded signal is output while switching between a decoded speech signal and a synthesized noise signal corresponding to speech interval and non-speech interval, as another aspect, it may be possible to add a noise signal synthesized at a non-speech interval to a decoded speech signal also at a speech interval to output. Further, it may be possible that a coding side is provided with a means for separating an input speech signal including a noise signal into the noise signal and speech signal with no noise, and using coded data of the separated speech signal and noise signal, a decoding side adds a noise signal synthesized at a non-speech interval to a decoded speech signal also at a speech interval to output as in the above case.

[0126] Thus, according to this embodiment, speech coding enabling coding of a speech signal with high quality is performed at a speech interval, while at a non-speech interval, a noise signal is coded and decoded using a noise signal analysis apparatus and synthesis apparatus with less perceptual deterioration. It is thereby possible to perform coding of high quality even in circumstances with a background noise. Further, since statistical characteristics of a noise signal of an actual surrounding noise are expected to be constant over a relatively long period (for example, a few seconds to a few tens of seconds), it is sufficient to set a transmit period of model parameters at such a long period. Therefore, an information amount of model parameters of a noise signal to be transmitted to a decoding side is reduced, and it is possible to achieve efficient transmission.

[0127] Further, it may be possible to achieve, using software (program), the processing performed by any one of the noise signal analysis apparatuses and noise signal synthesis apparatuses as explained in above embodiments 1 and 3 and speech coding apparatuses and speech decoding apparatuses as explained in above embodiments 2 and 4, and store the software (program) in a computer readable storage medium.

[0128] As is apparent from the foregoing, according to the present invention, it is possible to synthesize a noise signal with less perceptual deterioration by representing the noise signal with statistical models.

[0129] This application is based on Japanese Patent Applications No. 2000-270588 and No. 2001-070148, filed on Sep. 6, 2000 and on Mar. 13, 2001 respectively, the entire contents of which are expressly incorporated by reference herein.

Industrial Applicability

[0130] The present invention relates to a noise signal analysis apparatus and synthesis apparatus for analyzing and synthesizing a background noise signal superimposed on a speech signal, and is suitable for a speech coding apparatus for coding the speech signal using the analyzing apparatus and synthesis apparatus.

1. A noise signal analysis apparatus comprising: generating means forgenerating a plurality of stationary noise models represented by anamplitude spectral time series following a statistical distribution witha duration of the amplitude spectral time series following anotherstatistical distribution; and processing means for processing a noisesignal as a spectral series statistically transiting between theplurality of stationary noise models.
 2. A noise signal analysisapparatus comprising: frequency transforming means for transforming anoise signal into a signal of frequency domain to calculate a spectrumof the noise signal; storing means for storing a plurality of pieces ofmodel information concerning a spectrum of a stationary noise model;selecting means for selecting, among the plurality of pieces of modelinformation, a piece of model information corresponding to the spectrumof the noise signal based on a predetermined condition; and informationgenerating means for generating statistical parameters concerning astationary noise model and transition probability information that is aprobability of transiting between a plurality of stationery noise modelsusing a timewise series of the selected model information.
 3. A noisesignal synthesis apparatus comprising noise signal generating means forgenerating a noise signal using the statistical parameters and thetransition probability information generated in the noise signalanalysis apparatus according to claim
 2. 4. The noise signal synthesisapparatus according to claim 3, further comprising: transition seriesgenerating means for generating information on a transition series of astationary noise model, using transition probability information that isa probability of transiting between a plurality of stationary noisemodels; duration calculating means for calculating a duration of thestationary noise model using statistical parameters concerning thestationary noise model; storing means for storing model information on aspectrum of the stationary noise model; random phase generating meansfor generating random phases; spectrum generating means for generating aspectral time series using the generated information on the transitionseries of the stationary noise model, the calculated duration, thestored model information on the spectrum of the stationary noise model,and the generated random phases; and inverse frequency transformingmeans for transforming a generated spectrum into a signal of timedomain.
 5. A speech coding apparatus that performs coding on a noise signal at a non-speech interval of a speech signal, using the noise signal analysis apparatus according to claim 2.
 6. A speech decoding apparatus that performs decoding on a noise signal at a non-speech interval of a speech signal, using the noise signal synthesis apparatus according to claim 3.
 7. A noise signal analysis apparatus comprising: frequency transforming means for transforming a noise signal into a signal of frequency domain to calculate a spectrum of the noise signal; spectral model parameter calculating/quantizing means for calculating and quantizing spectral model parameters that are statistical parameters concerning an amplitude spectral time series of a stationary noise model to output quantized indexes; and duration model/transition probability calculating/quantizing means for calculating and quantizing statistical parameters concerning a duration of the amplitude spectral time series of the stationary noise model and transition probability information that is a probability of transiting between a plurality of stationary noise models to output quantized indexes.
 8. The noise signal analysis apparatus according to claim 7, wherein the spectral model parameter calculating/quantizing means comprises: power normalizing means for normalizing power of an amplitude spectrum of an input noise signal obtained in the frequency transforming means; storing means for storing typical vector sets of amplitude spectra each representing a noise signal; clustering means for clustering amplitude spectra with power normalized obtained in the power normalizing means, using the typical vector sets stored in the storing means; each-cluster average spectrum calculating means for selecting a plurality of clusters in descending order of frequency of selection for each modeling interval of the input noise signal, and calculating for each cluster an average spectrum of an input amplitude spectrum belonging to the selected cluster; modeling interval average power quantizing means for calculating average power of a modeling interval of the input noise signal to quantize; and error spectrum/power correction value quantizing means for quantizing an error spectrum for each cluster and a power correction value for the average power of the modeling interval, using the average spectrum of each cluster obtained in the each-cluster average spectrum calculating means and quantized average power of the modeling interval obtained in the modeling interval average power quantizing means.
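By way of a hedged example, the power normalizing, clustering and each-cluster averaging elements recited above might look like the sketch below. The names are hypothetical. Normalization to unit mean power, nearest-neighbour assignment to the stored typical vectors, and averaging of the normalized spectra are all assumptions; the quantizing elements are not shown.

```python
from collections import Counter


def normalize_power(amplitude):
    """Scale an amplitude spectrum to unit mean power (assumed convention)."""
    power = sum(a * a for a in amplitude) / len(amplitude)
    scale = power ** -0.5 if power > 0.0 else 0.0
    return [a * scale for a in amplitude]


def nearest(vector, codebook):
    """Index of the typical vector with the smallest squared distance."""
    def dist(i):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=dist)


def cluster_averages(spectra, codebook, n_clusters):
    """Cluster one modeling interval and average the most frequent clusters.

    Returns the per-frame cluster labels and, for the `n_clusters` most
    frequently selected clusters, the mean of their normalized spectra.
    """
    normalized = [normalize_power(s) for s in spectra]
    labels = [nearest(s, codebook) for s in normalized]

    # Clusters in descending order of how often they were selected.
    top = [c for c, _ in Counter(labels).most_common(n_clusters)]

    averages = {}
    for c in top:
        members = [s for s, lab in zip(normalized, labels) if lab == c]
        averages[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, averages
```

Normalizing before clustering lets the typical vectors describe spectral shape only, while the level of the interval is carried separately by its average power.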
 9. A noise signal synthesis apparatus comprising noise signal generating means for generating a noise signal using the quantized indexes generated in the noise signal analysis apparatus according to claim 7.
 10. The noise signal synthesis apparatus according to claim 9, further comprising: transition series generating means for generating information on a transition series of a stationary noise model, using quantized indexes of transition probability information that is a probability of transiting between a plurality of stationary noise models; duration calculating means for calculating a duration of the stationary noise model using quantized indexes of statistical parameters concerning the duration; spectral model parameter decoding means for decoding the spectral model parameters using quantized indexes of the spectral model parameters; random phase generating means for generating random phases; spectrum generating means for generating a spectral time series using the generated information on the transition series of the stationary noise model, the calculated duration, the spectral model parameters, and the generated random phases; and inverse frequency transforming means for transforming a generated spectrum into a signal of time domain.
 11. A speech coding apparatus that performs coding on a noise signal at a non-speech interval of a speech signal, using the noise signal analysis apparatus according to claim 7.
 12. A speech decoding apparatus that performs decoding on a noise signal at a non-speech interval of a speech signal, using the noise signal synthesis apparatus according to claim 9.
 13. A noise signal analysis method comprising: a frequency transforming step of transforming a noise signal into a signal of frequency domain to calculate a spectrum of the noise signal; a storing step of storing a plurality of pieces of model information concerning a spectrum of a stationary noise model; a selecting step of selecting, among the plurality of pieces of model information, a piece of model information corresponding to the spectrum of the noise signal based on a predetermined condition; and an information generating step of generating statistical parameters concerning a stationary noise model and transition probability information that is a probability of transiting between a plurality of stationary noise models using a timewise series of the selected model information.
 14. A noise signal synthesis method comprising: a transition series generating step of generating information on a transition series of a stationary noise model, using the transition probability information that is a probability of transiting between the plurality of stationary noise models generated by the noise signal analysis method according to claim 13; a duration calculating step of calculating a duration of the stationary noise model using statistical parameters concerning the stationary noise model; a storing step of storing model information on a spectrum of the stationary noise model; a random phase generating step of generating random phases; a spectrum generating step of generating a spectral time series using the generated information on the transition series of the stationary noise model, the calculated duration, the stored model information on the spectrum of the stationary noise model, and the generated random phases; and an inverse frequency transforming step of transforming a generated spectrum into a signal of time domain.
 15. A noise signal analysis method comprising: a frequency transforming step of transforming a noise signal into a signal of frequency domain to calculate a spectrum of the noise signal; a spectral model parameter calculating/quantizing step of calculating and quantizing spectral model parameters that are statistical parameters concerning an amplitude spectral time series of a stationary noise model to output quantized indexes; and a duration model/transition probability calculating/quantizing step of calculating and quantizing statistical parameters concerning a duration of the amplitude spectral time series of the stationary noise model and transition probability information that is a probability of transiting between a plurality of stationary noise models to output quantized indexes.
 16. The noise signal analysis method according to claim 15, wherein the spectral model parameter calculating/quantizing step comprises: a power normalizing step of normalizing power of an amplitude spectrum of an input noise signal obtained in the frequency transforming step; a storing step of storing typical vector sets of amplitude spectra each representing a noise signal; a clustering step of clustering amplitude spectra with power normalized obtained in the power normalizing step, using the typical vector sets stored in the storing step; an each-cluster average spectrum calculating step of selecting a plurality of clusters in descending order of frequency of selection for each modeling interval of the input noise signal, and calculating for each cluster an average spectrum of an input amplitude spectrum belonging to the selected cluster; a modeling interval average power quantizing step of calculating average power of a modeling interval of the input noise signal to quantize; and an error spectrum/power correction value quantizing step of quantizing an error spectrum for each cluster and a power correction value for the average power of the modeling interval, using the average spectrum of each cluster obtained in the each-cluster average spectrum calculating step and quantized average power of the modeling interval obtained in the modeling interval average power quantizing step.
 17. A noise signal synthesis method comprising: a transition series generating step of generating information on a transition series of a stationary noise model, using quantized indexes of transition probability information that is a probability of transiting between a plurality of stationary noise models generated by the noise signal analysis method according to claim 15; a duration calculating step of calculating a duration of the stationary noise model using quantized indexes of statistical parameters concerning the duration; a spectral model parameter decoding step of decoding the spectral model parameters using quantized indexes of the spectral model parameters; a random phase generating step of generating random phases; a spectrum generating step of generating a spectral time series using the generated information on the transition series of the stationary noise model, the calculated duration, the spectral model parameters, and the generated random phases; and an inverse frequency transforming step of transforming a generated spectrum into a signal of time domain.
 18. A program for operating a computer to have functions of: frequency transforming means for transforming a noise signal into a signal of frequency domain to calculate a spectrum of the noise signal; storing means for storing a plurality of pieces of model information concerning a spectrum of a stationary noise model; selecting means for selecting, among the plurality of pieces of model information, a piece of model information corresponding to the spectrum of the noise signal based on a predetermined condition; and information generating means for generating statistical parameters concerning a stationary noise model and transition probability information that is a probability of transiting between a plurality of stationary noise models using a timewise series of the selected model information.
 19. A program for operating a computer to have functions of: transition series generating means for generating information on a transition series of a stationary noise model, using the transition probability information that is a probability of transiting between a plurality of stationary noise models; duration calculating means for calculating a duration of the stationary noise model using statistical parameters concerning the stationary noise model; storing means for storing model information on a spectrum of the stationary noise model; random phase generating means for generating random phases; spectrum generating means for generating a spectral time series using the generated information on the transition series of the stationary noise model, the calculated duration, the stored model information on the spectrum of the stationary noise model, and the generated random phases; and inverse frequency transforming means for transforming a generated spectrum into a signal of time domain.